A deterministic gradient-based approach to avoid saddle points
Authors
Abstract
Loss functions with a large number of saddle points are one of the major obstacles to training modern machine learning (ML) models efficiently. First-order methods such as gradient descent (GD) are usually the methods of choice for training ML models. However, these methods converge to saddle points for certain choices of initial guesses. In this paper, we propose a modification of the recently proposed Laplacian smoothing gradient descent (LSGD) [Osher et al., arXiv:1806.06317], called modified LSGD (mLSGD), and demonstrate its potential to avoid saddle points without sacrificing the convergence rate. Our analysis is based on the attraction region, formed by all starting points for which the considered numerical scheme converges to a saddle point. We investigate the attraction region's dimension both analytically and numerically. For a canonical class of quadratic functions, we show that the dimension of the attraction region for mLSGD is $\lfloor (n-1)/2\rfloor$, and hence it is significantly smaller than that of GD, whose dimension is $n-1$.
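As a rough illustration of the mechanism, the sketch below contrasts plain GD with Laplacian-smoothing GD on a canonical quadratic with a saddle at the origin. It follows the LSGD preconditioner $(I-\sigma L)^{-1}$ of Osher et al., arXiv:1806.06317, not the mLSGD modification analysed in the paper; the test function, step size and $\sigma$ are illustrative choices, not taken from the paper.

# Minimal sketch (not the paper's code): plain GD vs. Laplacian-smoothing GD
# on a canonical quadratic with a saddle at the origin.  LSGD preconditions
# the gradient by (I - sigma*L)^{-1}, with L the periodic 1-D discrete
# Laplacian (Osher et al., arXiv:1806.06317).  The mLSGD modification
# analysed in the paper is not reproduced; all parameters are illustrative.
import numpy as np

def smoothing_matrix(n, sigma):
    """A = I - sigma * L, where L is the periodic 1-D discrete Laplacian."""
    L = -2.0 * np.eye(n)
    idx = np.arange(n)
    L[idx, (idx + 1) % n] = 1.0
    L[idx, (idx - 1) % n] = 1.0
    return np.eye(n) - sigma * L

d = np.array([1.0, 2.0, -1.0, -2.0])       # f(x) = 0.5 * sum_i d_i x_i^2, saddle at 0
grad = lambda x: d * x
x0 = np.array([1.0, 1.0, 0.0, 0.0])        # lies in GD's attraction region of the saddle
A_inv = np.linalg.inv(smoothing_matrix(len(d), sigma=1.0))

x_gd, x_ls, eta = x0.copy(), x0.copy(), 0.1
for _ in range(100):
    x_gd -= eta * grad(x_gd)               # plain GD: stays on the stable manifold
    x_ls -= eta * A_inv @ grad(x_ls)       # smoothed gradient couples the coordinates

print("GD  :", x_gd)   # converges towards the saddle at the origin
print("LSGD:", x_ls)   # drifts away along the negative-curvature directions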
Similar resources
Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Nesterov's accelerated gradient descent (AGD), an instance of the general family of "momentum methods", provably achieves a faster convergence rate than gradient descent (GD) in the convex setting. However, whether these methods are superior to GD in the nonconvex setting remains open. This paper studies a simple variant of AGD, and shows that it escapes saddle points and finds a second-order stat...
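A rough sketch of the contrast, assuming the plain Nesterov-style update rather than the paper's full algorithm (which adds perturbations and negative-curvature exploitation): momentum amplifies the unstable direction of a strict saddle in fewer iterations. The step size, momentum parameter, start point and escape radius below are illustrative choices.

# Minimal sketch (not the paper's algorithm): plain GD vs. Nesterov-style
# momentum near the strict saddle of f(x, y) = 0.5*(x**2 - y**2).  The paper's
# method additionally uses perturbations and negative-curvature exploitation,
# omitted here; eta, theta, the start point and the escape radius are
# illustrative choices.
import numpy as np

grad = lambda p: np.array([p[0], -p[1]])     # gradient of 0.5*(x^2 - y^2)

def escape_time(theta, start=np.array([1.0, 1e-6]), eta=0.1):
    x = y = start.copy()
    for k in range(1, 10_000):
        x_new = y - eta * grad(y)            # gradient step at the look-ahead point
        y = x_new + theta * (x_new - x)      # momentum extrapolation (theta=0 -> plain GD)
        x = x_new
        if abs(x[1]) > 1.0:                  # left the saddle's neighbourhood
            return k
    return None

print("GD  (theta=0.0) escape iterations:", escape_time(theta=0.0))
print("AGD (theta=0.9) escape iterations:", escape_time(theta=0.9))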
Gradient Descent Can Take Exponential Time to Escape Saddle Points
Although gradient descent (GD) almost always escapes saddle points asymptotically [Lee et al., 2016], this paper shows that even with fairly natural random initialization schemes and non-pathological functions, GD can be significantly slowed down by saddle points, taking exponential time to escape. On the other hand, gradient descent with perturbations [Ge et al., 2015, Jin et al., 2017] is not...
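A toy version of this contrast, not the paper's exponential-time construction: GD started exactly on the stable manifold of a strict saddle never leaves it, while GD with small random perturbations (in the spirit of Ge et al., 2015 and Jin et al., 2017) is pushed off the manifold and escapes. The function, noise scale and iteration count are illustrative.

# Minimal sketch: GD started on the stable manifold of a strict saddle never
# leaves it, while GD with small random perturbations (in the spirit of
# Ge et al., 2015 / Jin et al., 2017) does.  f, the noise scale and the
# iteration count are illustrative; the paper's exponential-time construction
# chains many such saddles and is not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
grad = lambda p: np.array([p[0], -p[1]])   # f(x, y) = 0.5*(x**2 - y**2), saddle at (0, 0)

eta, noise = 0.1, 1e-3
p_gd = np.array([1.0, 0.0])                # y = 0: exactly on the stable manifold
p_pgd = p_gd.copy()
for _ in range(200):
    p_gd = p_gd - eta * grad(p_gd)
    p_pgd = p_pgd - eta * grad(p_pgd) + noise * rng.standard_normal(2)

print("GD       :", p_gd)    # y stays exactly 0; the iterate approaches the saddle
print("perturbed:", p_pgd)   # the noise seeds y, which then grows away from the saddle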
First-order Methods Almost Always Avoid Saddle Points
We establish that first-order methods avoid saddle points for almost all initializations. Our results apply to a wide variety of first-order methods, including gradient descent, block coordinate descent, mirror descent and variants thereof. The connecting thread is that such algorithms can be studied from a dynamical systems perspective in which appropriate instantiations of the Stable Manifold...
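A minimal sketch of this viewpoint, with an illustrative Hessian and step size: gradient descent is the map $g(x) = x - \eta\nabla f(x)$, whose Jacobian at a critical point is $I - \eta\nabla^2 f(x)$; at a strict saddle that Jacobian has an eigenvalue larger than 1, which is what the stable-manifold argument exploits to show that the set of attracted initial points is lower-dimensional.

# Minimal sketch of the dynamical-systems view: GD is the map
# g(x) = x - eta * grad f(x), with Jacobian I - eta * Hessian(f) at a critical
# point.  At a strict saddle the Hessian has a negative eigenvalue, so the
# Jacobian has an eigenvalue > 1 and the saddle's stable manifold is
# lower-dimensional.  H and eta are illustrative choices.
import numpy as np

eta = 0.1
H = np.diag([1.0, 2.0, -1.0])              # Hessian at a strict saddle (one negative eigenvalue)
jacobian = np.eye(3) - eta * H             # Jacobian of the GD map at the critical point
eigs = np.linalg.eigvalsh(jacobian)

print("eigenvalues of the GD map's Jacobian:", eigs)
print("unstable directions (|lambda| > 1):", int(np.sum(np.abs(eigs) > 1)))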
A Generic Approach for Escaping Saddle points
A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from saddles one has to use second-order methods. However, most works on second-order methods rely extensively on expensive Hessian-based computations, making them ...
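One common second-order ingredient is a negative-curvature step, sketched below purely for illustration (this is not the cited paper's algorithm): if the Hessian at the current point has a negative eigenvalue, move along the corresponding eigenvector on its first-order-descent side. Practical methods replace the full eigendecomposition used here with cheaper Hessian-vector products; the function, point and step size are illustrative.

# Minimal sketch of a negative-curvature escape step, not the cited paper's
# algorithm: if the Hessian at the current point has a negative eigenvalue,
# step along the corresponding eigenvector, choosing the side that is also a
# first-order descent direction.  f, the point and the step size are
# illustrative.
import numpy as np

def f_grad_hess(p):
    x, y = p
    g = np.array([x, -y])                    # gradient of f(x, y) = 0.5*(x**2 - y**2)
    H = np.array([[1.0, 0.0], [0.0, -1.0]])  # Hessian of f
    return g, H

p = np.array([0.0, 1e-8])                    # almost exactly at the saddle (0, 0)
g, H = f_grad_hess(p)
lam, V = np.linalg.eigh(H)

if lam[0] < 0:                               # negative curvature found
    d = V[:, 0]
    d = -d if d @ g > 0 else d               # pick the first-order descent side
    p = p + 0.5 * d                          # escape step along negative curvature
else:
    p = p - 0.1 * g                          # otherwise an ordinary gradient step

print("new iterate:", p)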
A geometric approach to saddle points of surfaces
We outline an alternative approach to the geometric notion of a saddle point for real-valued functions of two real variables. It is argued that our treatment is more natural than the usual treatment of this topic in standard texts on calculus.
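For reference, the standard analytic criterion that such geometric treatments complement is the second-derivative test from calculus (not taken from the cited paper): at a critical point $(a,b)$ of $f(x,y)$, a negative Hessian determinant identifies a saddle,
\[
  D(a,b) = f_{xx}(a,b)\,f_{yy}(a,b) - \bigl(f_{xy}(a,b)\bigr)^{2}, \qquad D(a,b) < 0 \;\Rightarrow\; (a,b)\ \text{is a saddle point of } f.
\]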
Journal
Journal title: European Journal of Applied Mathematics
Year: 2022
ISSN: 0956-7925, 1469-4425
DOI: https://doi.org/10.1017/s0956792522000316